An Autoencoder Approach to Learning Bilingual Word Representations

Authors

A. P. Sarath Chandar

Stanislas Lauly

Hugo Larochelle

Mitesh M. Khapra

Balaraman Ravindran

Vikas C. Raykar

Amrita Saha

Abstract

Cross-language learning allows one to use training data from one language to build models for a different language. Many approaches to bilingual learning require that we have word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are coherent between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). In experiments on 3 language pairs, we show that our approach achieves state-of-the-art performance, outperforming a method exploiting word alignments and a strong machine translation baseline.
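The core idea in the abstract, reconstructing the bag-of-words of aligned sentences within and between languages, can be illustrated with a small sketch. This is not the authors' implementation: the class name `BilingualBowAutoencoder`, the toy vocabulary, the sigmoid layers, the cross-entropy loss, and the plain gradient-descent update are all assumptions chosen for brevity. Each sentence is encoded from one language into a shared hidden vector, and that vector is asked to reconstruct the bag-of-words of both the same sentence and its translation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(target, pred, eps=1e-9):
    """Binary cross-entropy, summed over words and averaged over sentences."""
    return -np.mean(np.sum(target * np.log(pred + eps)
                           + (1 - target) * np.log(1 - pred + eps), axis=1))

class BilingualBowAutoencoder:
    """Hypothetical toy model: one encoder and decoder per language, shared hidden layer."""

    def __init__(self, vocab_x, vocab_y, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Row i of enc[lang] serves as the learned vector for word i of that language.
        self.enc = {"x": rng.normal(0, 0.1, (vocab_x, dim)),
                    "y": rng.normal(0, 0.1, (vocab_y, dim))}
        self.dec = {"x": rng.normal(0, 0.1, (dim, vocab_x)),
                    "y": rng.normal(0, 0.1, (dim, vocab_y))}
        self.b_h = np.zeros(dim)
        self.b_out = {"x": np.zeros(vocab_x), "y": np.zeros(vocab_y)}

    def encode(self, bow, lang):
        return sigmoid(bow @ self.enc[lang] + self.b_h)

    def decode(self, h, lang):
        return sigmoid(h @ self.dec[lang] + self.b_out[lang])

    def loss(self, bows):
        """Sum of the four reconstruction losses: x->x, x->y, y->x, y->y."""
        total = 0.0
        for src in ("x", "y"):
            h = self.encode(bows[src], src)
            for tgt in ("x", "y"):
                total += bce(bows[tgt], self.decode(h, tgt))
        return total

    def step(self, bows, lr=0.5):
        """One gradient-descent update per source language (manual backprop)."""
        n = bows["x"].shape[0]
        for src in ("x", "y"):
            h = self.encode(bows[src], src)
            grad_h = np.zeros_like(h)
            for tgt in ("x", "y"):
                # Gradient of the loss w.r.t. the output pre-activation (sigmoid + BCE).
                err = (self.decode(h, tgt) - bows[tgt]) / n
                grad_h += err @ self.dec[tgt].T
                self.dec[tgt] -= lr * (h.T @ err)
                self.b_out[tgt] -= lr * err.sum(axis=0)
            grad_pre = grad_h * h * (1 - h)
            self.enc[src] -= lr * (bows[src].T @ grad_pre)
            self.b_h -= lr * grad_pre.sum(axis=0)

def to_bow(sentence, vocab):
    """Binary bag-of-words vector: 1.0 for every vocabulary word in the sentence."""
    vec = np.zeros(len(vocab))
    for word in sentence.split():
        vec[vocab.index(word)] = 1.0
    return vec

# Toy sentence-aligned corpus (made up for illustration; no word alignments used).
en_vocab = ["cat", "dog", "sleeps", "runs"]
de_vocab = ["katze", "hund", "schlaeft", "rennt"]
pairs = [("cat sleeps", "katze schlaeft"), ("dog runs", "hund rennt"),
         ("cat runs", "katze rennt"), ("dog sleeps", "hund schlaeft")]
bows = {"x": np.array([to_bow(en, en_vocab) for en, _ in pairs]),
        "y": np.array([to_bow(de, de_vocab) for _, de in pairs])}

model = BilingualBowAutoencoder(len(en_vocab), len(de_vocab), dim=3)
initial_loss = model.loss(bows)
for _ in range(500):
    model.step(bows)
final_loss = model.loss(bows)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In this sketch the rows of `model.enc["x"]` and `model.enc["y"]` play the role of the word representations. Because both encoders feed the same hidden layer and both decoders read from it, the only supervision is which sentences are translations of each other, which is the property the abstract emphasizes.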


Related resources

Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

We present an approach to learning multi-sense word embeddings relying both on monolingual and bilingual information. Our model consists of an encoder, which uses monolingual and bilingual context (i.e. a parallel sentence) to choose a sense for a given word, and a decoder which predicts context words based on the chosen sense. The two components are estimated jointly. We observe that the word ...


Transduction Recursive Auto-Associative Memory: Learning Bilingual Compositional Distributed Vector Representations of Inversion Transduction Grammars

We introduce TRAAM, or Transduction RAAM, a fully bilingual generalization of Pollack’s (1990) monolingual Recursive Auto-Associative Memory neural network model, in which each distributed vector represents a bilingual constituent—i.e., an instance of a transduction rule, which specifies a relation between two monolingual constituents and how their subconstituents should be permuted. Bilingual ...


Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation

Learning semantic representations and tree structures of bilingual phrases is beneficial for statistical machine translation. In this paper, we propose a new neural network model called Bilingual Correspondence Recursive Autoencoder (BCorrRAE) to model bilingual phrases in translation. We incorporate word alignments into BCorrRAE to allow it to freely access bilingual constraints at different leve...


Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences

Parallel sentence representations are important for bilingual and cross-lingual tasks in natural language processing. In this paper, we explore a bilingual autoencoder approach to model parallel sentences. We extract sentence-level global descriptors (e.g. min, max) from word embeddings, and construct two monolingual autoencoders over these descriptors on the source and target language. In orde...


Learning Multilingual Word Representations using a Bag-of-Words Autoencoder

Recent work on learning multilingual word representations usually relies on the use of word-level alignments (e.g. inferred with the help of GIZA++) between translated sentences, in order to align the word embeddings in different languages. In this workshop paper, we investigate an autoencoder model for learning multilingual word representations that does without such word-level alignments. Th...


Publication date: 2014
